-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tighten database access in htex/monitoring test to avoid issue #3287 #3293
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
A more principled fix of #3287 looks to be much more deeply invasive, and involves a big rework of how ZMQ is used.
khk-globus
approved these changes
Mar 26, 2024
benclifford
added a commit
that referenced
this pull request
Apr 8, 2024
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293 This should not change behaviour: each executor has a single PolledExecutorFacade and a single strategy.ExecutorState, and this PR moves the 'first' field from one to the other. parsl/tests/test_monitoring/test_htex_init_blocks_vs_monitoring.py tests that init_blocks handling still fires properly - that's what is switched by this 'first' field.
benclifford
added a commit
that referenced
this pull request
Apr 8, 2024
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293 This should not change behaviour: each executor has a single PolledExecutorFacade and a single strategy.ExecutorState, and this PR moves the 'first' field from one to the other. parsl/tests/test_monitoring/test_htex_init_blocks_vs_monitoring.py tests that init_blocks handling still fires properly - that's what is switched by this 'first' field.
benclifford
added a commit
that referenced
this pull request
Apr 8, 2024
This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293 This value is constant, in both executors and in PollingExecutorFacade. This PR should not change behaviour
benclifford
added a commit
that referenced
this pull request
Apr 10, 2024
…ugin Before this PR, the DataFlowKernel and each JobStatusPoller PolledExecutorFacade each opened a ZMQ connection to the monitoring router. These connections are not threadsafe, but (especially in the DFK case) no reasoning or checking stops the DFK's connection being used in multiple threads. Before this PR, the MonitoringHub presented a 'send' method and stored the DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade contained a copy of the ZMQ channel code to open its own channel, configured using parameters from the DFK passed in at construction. This PR: * moves the above uses to use the MonitoringRadio interface (which was originally designed for remote workers to send monitoring information, but seems ok here too) * has MonitoringHub construct an appropriate MonitoringRadio instance for use on the submit side, exposed as self.radio; * replaces the implementation of send with a new MultiprocessingQueueRadio which is thread safe but only works in the same multiprocessing environment as the launched monitoring database manager process (which is true on the submit side, but for example means this radio cannot be used on most remote workers) This work aligns with the prototype in #3315 (which pushes on monitoring radio configuration for remote workers) and pushes in the direction (without getting there) of allowing other submit-side hooks. This work removes some monitoring specific code from the JobStatusPoller, replacing it with a dependency injection style. This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293 This PR will change how monitoring messages are delivered from the submitting process, and the most obvious thing I can think of that will change is how this will behave under load: heavily loaded messaging causing full buffers and other heavy-load symptoms will now behave as multiprocessing Queues do, rather than as ZMQ connections do. I have not attempted to characterise either of those modes.
benclifford
added a commit
that referenced
this pull request
Apr 11, 2024
…ugin (#3338) Before this PR, the DataFlowKernel and each JobStatusPoller PolledExecutorFacade each opened a ZMQ connection to the monitoring router. These connections are not threadsafe, but (especially in the DFK case) no reasoning or checking stops the DFK's connection being used in multiple threads. Before this PR, the MonitoringHub presented a 'send' method and stored the DFK's ZMQ connection in self._dfk_channel, and each PolledExecutorFacade contained a copy of the ZMQ channel code to open its own channel, configured using parameters from the DFK passed in at construction. This PR: * moves the above uses to use the MonitoringRadio interface (which was originally designed for remote workers to send monitoring information, but seems ok here too) * has MonitoringHub construct an appropriate MonitoringRadio instance for use on the submit side, exposed as self.radio; * replaces the implementation of send with a new MultiprocessingQueueRadio which is thread safe but only works in the same multiprocessing environment as the launched monitoring database manager process (which is true on the submit side, but for example means this radio cannot be used on most remote workers) This work aligns with the prototype in #3315 (which pushes on monitoring radio configuration for remote workers) and pushes in the direction (without getting there) of allowing other submit-side hooks. This work removes some monitoring specific code from the JobStatusPoller, replacing it with a dependency injection style. This is part of work to move JobStatusPoller facade state into other classes, as part of job handling rearrangements in PR #3293 This PR will change how monitoring messages are delivered from the submitting process, and the most obvious thing I can think of that will change is how this will behave under load: heavily loaded messaging causing full buffers and other heavy-load symptoms will now behave as multiprocessing Queues do, rather than as ZMQ connections do. I have not attempted to characterise either of those modes. Co-authored-by: Kevin Hunter Kesling <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
A more principled fix of #3287 looks to be much more deeply invasive, and involves a big rework of how ZMQ is used.
Type of change